ABSTRACT

Let a person express his uncertainty about an event E, conditional
upon an event F, by a number x and let him be given, as a result,
a score which depends on x and the truth or falsity of E when F
is true. It is shown that if the scores are additive for different
events and if the person chooses admissible values only, then there
exists a known transform of the values x to values which are
probabilities. In particular, it follows that values x derived
by significance tests, confidence intervals or by the rules of
fuzzy logic are inadmissible. Only probability is a sensible
description of uncertainty.
SCORING RULES AND THE INEVITABILITY OF PROBABILITY


by 

Dennis  V.  Lindley 

Introduction 

Suppose that a person, considering an event E about which he
is uncertain, describes that uncertainty by a number x. De Finetti's
(1974) basic argument is that if the person is scored an amount $(x - 1)^2$
if E is true and $x^2$ if E is false, and if the scores for different
events are additive, then x must be a probability for E. This
result has been generalized to some other scores besides the quadratic
one: a seminal paper is that by Savage (1971) which contains several
references. In the present paper we show that De Finetti's argument
applies to virtually every reasonable score function with the only
modification that a known transform of x, rather than x itself, must be
a probability. The argument may be viewed as providing another axiomatic
justification for the Bayesian position, advantages being the simplicity
both of the assumptions and of the proof. It also demonstrates that any
description of uncertainty by numbers that do not obey the rules of the
probability calculus, even after transformation, will violate the simple
assumptions we make. Examples of such non-probabilistic assignments are
significance levels, confidence statements and possibilities in fuzzy
logic. The argument is extended to where the description is by means of
two numbers, perhaps upper and lower probabilities, as suggested by
Dempster (1968) and Smith (1961), to demonstrate that these are in
disagreement with the assumptions. The message is essentially that only
probabilistic descriptions of uncertainty are reasonable.
I am grateful to Richard E. Barlow for inviting me to Berkeley and
to L. A. Zadeh who asked me to give a seminar on the relationship between
probability and the ideas of fuzzy logic. This seminar suggested the
possibility of the existence of a scoring rule that led to the laws of
fuzzy sets: the paper shows no such rule exists. The observations of
Robert Nau on a first draft of the paper have been of considerable value
to me.

Notation:

We consider real variables X, Y, ... taking values x, y, ....
Events are denoted by E, F, ... and the same symbol is used for the
indicator variable of an event, so that E = 1 (0) if E is true (false).
f(X,E) is a function of the variables X and E: f(x,1) is the value
of that function when X takes the value x and E is true. f'(X,E)
is the derivative of that function with respect to X.

Score Assumption:

For a given score function f(X,E), a person who describes his
uncertainty about E, conditional on F, by a real number x will
receive a score f(x,E)F. The scores are additive in that if $x_i$ refers
to $E_i$ conditional on $F_i$ for i = 1, 2, ..., n, then the total score
for all these descriptions will be $\sum_{i=1}^{n} f(x_i, E_i) F_i$.

We consider the question of what are reasonable values for him to
choose. A score may be thought of as a reward or as a penalty. For
definiteness we shall think of it as a penalty, so that the person wishes
to reduce his score.
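The additivity in the score assumption can be illustrated with a short sketch. It assumes De Finetti's quadratic penalty as the score function; the names `quadratic_score` and `total_score` and the sample numbers are illustrative and are not taken from the paper.

```python
def quadratic_score(x, e):
    """Quadratic penalty: (x - 1)^2 if E is true (e == 1), x^2 if E is false."""
    return (x - 1.0) ** 2 if e == 1 else x ** 2


def total_score(values, events, conditions, f=quadratic_score):
    """Additive total: sum of f(x_i, E_i) * F_i over all assessments.

    A term contributes only when its conditioning indicator F_i equals 1.
    """
    return sum(f(x, e) * c for x, e, c in zip(values, events, conditions))


# Three assessments; the third is conditional on an event that did not occur,
# so it adds nothing to the penalty.
example_total = total_score([0.8, 0.3, 0.5], [1, 0, 1], [1, 1, 0])
print(example_total)
```

The third assessment drops out because its indicator F is zero, which is the sense in which only the case F = 1 matters to the person.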
Admissibility Assumption:

A person will not choose values $x_i$ for $E_i$ conditional on $F_i$
(i = 1, 2, ..., n) if there exist values $y_1, y_2, \ldots, y_n$ such that

$$\sum_i f(y_i, E_i) F_i \le \sum_i f(x_i, E_i) F_i \tag{1}$$

for all values of the indicator variables, and strict inequality holds for
some values.

If the conditions do obtain, he could reduce his penalty in some
circumstances without increasing it in any. In statistical language the set
$(x_1, \ldots, x_n)$ is inadmissible and the assumption says that only
admissible values will be selected.

Origin and Scale Assumption:

For the uncertainty of E ($\bar{E}$) conditional on E, there exists a
unique admissible value $x_T$ ($x_F$) the same for all E; and $x_T \ne x_F$.

The suffix T (F) denotes true (false). Without loss of generality
we suppose $x_F < x_T$.

Regularity Assumptions:

X can assume all values in a closed interval I of the real line,
f'(X,E) exists, is continuous in X for each E and, for both E = 1
and E = 0, vanishes at most once. Also $x_F$ and $x_T$ are interior
points of I.

These regularity assumptions are unnecessarily restrictive and are
later relaxed. Our reason for introducing them in this form is that the
proof is then unencumbered with side-issues that might otherwise obscure
the argument. We first prove three lemmas.
Lemma 1:

All values in the closed interval $[x_F, x_T]$ are admissible, and values
outside are inadmissible. The function

$$P(x) = \frac{f'(x,0)}{f'(x,0) - f'(x,1)} \tag{2}$$

satisfies $0 \le P(x) \le 1$ in $[x_F, x_T]$, is continuous and $P(x_F) = 0$,
$P(x_T) = 1$. In particular the equation in x, P(x) = p, has at least
one admissible solution for any p with $0 \le p \le 1$.

For E conditional on E the only score is f(x,1). By the origin
and scale assumption, $x_T$ is the unique admissible value and therefore
minimizes this function. By the regularity assumption $f'(x_T,1) = 0$.
Similarly for $\bar{E}$ conditional on E, $f'(x_F,0) = 0$. Again by the
regularity assumption f'(x,1) > (<) 0 for x > (<) $x_T$ and
f'(x,0) > (<) 0 for x > (<) $x_F$: in particular, $f'(x_T,0) > 0$ and
$f'(x_F,1) < 0$.

For E conditional on F the score will be f(x,1) if EF = 1
and f(x,0) if (1 - E)F = 1, and otherwise zero. If f'(x,1) and
f'(x,0) are both strictly positive (negative) x is inadmissible since
a small decrease (increase) in x will reduce both scores. Combining
this with the result in the final sentence of the last paragraph, we see
that only values in $[x_F, x_T]$ are admissible. All values in $[x_F, x_T]$
are admissible since any decrease from x, although it will lower
f(x,0), will necessarily increase f(x,1): and similarly for an increase
from x.

The properties claimed for P(x) all easily follow from the results
already established.
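As a numerical illustration of the transform in (2), the following sketch estimates the two derivatives by central differences and forms their ratio. It assumes the quadratic score, for which $x_F = 0$ and $x_T = 1$; the helper names `derivative`, `probability_transform` and `quadratic` are hypothetical.

```python
def derivative(g, x, h=1e-6):
    """Central-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


def probability_transform(f, x):
    """P(x) = f'(x,0) / (f'(x,0) - f'(x,1)) for a score function f(x, e)."""
    d_false = derivative(lambda t: f(t, 0), x)
    d_true = derivative(lambda t: f(t, 1), x)
    return d_false / (d_false - d_true)


def quadratic(x, e):
    """Quadratic score, used here only as an example."""
    return (x - 1.0) ** 2 if e == 1 else x ** 2


for x in (0.0, 0.3, 0.75, 1.0):
    print(x, probability_transform(quadratic, x))
```

For this score the transform is the identity on [0, 1], so the printed pairs should agree up to rounding, with the end points mapping to 0 and 1 as the lemma requires.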


Lemma 2:

The values x for E and y for $\bar{E}$, both conditional on F,
being admissible imply P(x) + P(y) = 1.

The total scores in the two possible cases will be:

    EF = 1:          f(x,1) + f(y,0)
    (1 - E)F = 1:    f(x,0) + f(y,1).

Consider small changes in x to (x + h) and y to (y + k). The
resulting changes in these scores will be, to order h and k,

    f'(x,1)h + f'(y,0)k
and f'(x,0)h + f'(y,1)k.

Both these changes could be made negative, so reducing both scores and
making (x,y) inadmissible, by solving the linear equations in h and
k obtained by equating these to small, selected, negative values. The
only exception to this occurs when the determinant of the linear equations
vanishes. The condition for this is that f'(x,1)f'(y,1) = f'(x,0)f'(y,0)
or P(x) + P(y) = 1.

This argument fails at boundary points because the values of h
or k required to reduce both scores may not be permissible. Consider
the case $x = x_T$ where h < 0 and f'(x,1) = 0. If $y \ne x_F$, so
that f'(y,0) > 0, the first change is f'(y,0)k and for this to be
negative, k < 0. Since f'(x,0) > 0, the second change can be made
negative. Hence $x = x_T$ is inadmissible unless $y = x_F$ when the first
change is necessarily positive and $P(x_T) + P(x_F) = 1$. Other boundary
values follow similarly.
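The determinant argument of Lemma 2 can be checked numerically. The sketch below assumes, purely as an example, the fourth-power score $f(x,1) = (x - 1)^4$, $f(x,0) = x^4$ on [0, 1]; the function names are illustrative. When the determinant is nonzero, Cramer's rule gives a step (h, k) that lowers both total scores; when P(x) + P(y) = 1 the determinant vanishes and no such step exists.

```python
def d_false(x):
    """f'(x,0) for the assumed score f(x,0) = x^4."""
    return 4.0 * x ** 3


def d_true(x):
    """f'(x,1) for the assumed score f(x,1) = (x - 1)^4."""
    return 4.0 * (x - 1.0) ** 3


def transform(x):
    """Probability transform P(x) from equation (2)."""
    return d_false(x) / (d_false(x) - d_true(x))


def determinant(x, y):
    """Determinant of the first-order system in (h, k)."""
    return d_true(x) * d_true(y) - d_false(y) * d_false(x)


def improving_step(x, y, target=-1e-4):
    """Solve for (h, k) so that both first-order changes equal `target`."""
    det = determinant(x, y)
    h = target * (d_true(y) - d_false(y)) / det
    k = target * (d_true(x) - d_false(x)) / det
    return h, k


def totals(x, y):
    """Total scores in the cases EF = 1 and (1 - E)F = 1."""
    return (x - 1.0) ** 4 + y ** 4, x ** 4 + (y - 1.0) ** 4


print(determinant(0.3, 0.7), transform(0.3) + transform(0.7))
h, k = improving_step(0.3, 0.6)
print(totals(0.3, 0.6), totals(0.3 + h, 0.6 + k))
```

The pair (0.3, 0.7) satisfies the lemma, whereas (0.3, 0.6) does not and is dominated by the nearby pair found by the step.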
Lemma 3:

The values, x for F conditional on G, y for E conditional
on FG, and z for EF conditional on G, being admissible implies
P(z) = P(x)P(y).

The method of proof follows that of Lemma 2. The total scores in
the three possible cases will be:

    EFG = 1:          f(x,1) + f(y,1) + f(z,1)
    (1 - E)FG = 1:    f(x,1) + f(y,0) + f(z,0)
    (1 - F)G = 1:     f(x,0) + f(z,0).

Consider small changes in x, y and z; then these can result in
changes in the three total scores that are all negative, so making
(x,y,z) inadmissible, unless the determinant of the linear equations
is zero. Simple calculation establishes that the determinant is

$$[f'(x,0) - f'(x,1)]\,[f'(y,0) - f'(y,1)]\,[f'(z,0) - f'(z,1)]\,[P(x)P(y) - P(z)].$$

The first three factors do not vanish by results established in the proof
of Lemma 1. Hence the last factor vanishes, as required. The boundary
values require special consideration as in Lemma 2; details are omitted.
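The factorization of the determinant can be confirmed numerically. The sketch below draws arbitrary derivative values with f'(.,0) > 0 and f'(.,1) < 0, builds the 3 x 3 matrix of first-order coefficients from the three total scores, and compares its determinant with the factored form. The names `det3` and `both_sides` are illustrative, and the random values are not tied to any particular score function.

```python
import random


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def both_sides(ax, bx, ay, by, az, bz):
    """Return (determinant, factored form).

    a* stands for f'(.,0) and b* for f'(.,1) at x, y and z respectively.
    Rows follow the cases EFG = 1, (1 - E)FG = 1 and (1 - F)G = 1.
    """
    matrix = [[bx, by, bz], [bx, ay, az], [ax, 0.0, az]]

    def p(a, b):
        return a / (a - b)

    factored = (ax - bx) * (ay - by) * (az - bz) * (p(ax, bx) * p(ay, by) - p(az, bz))
    return det3(matrix), factored


rng = random.Random(0)
for _ in range(3):
    args = [rng.uniform(0.1, 2.0) * sign for sign in (1, -1, 1, -1, 1, -1)]
    print(both_sides(*args))
```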

Theorem: 

The  four  assumptions  listed  above  imply  that  the  values  x  describing 
uncertainty  will  be  such  that  the  transforms  P(x)  obey  the  laws  of 
probability. 

Lemma 1 establishes the convexity property that $0 \le P(x) \le 1$
and $P(x_T) = 1$. Lemma 2 is the additive property. Lemma 3 is the
multiplicative property.
The theorem states that admissibility implies probability, through
a transform of the stated value, but not the converse. To consider this,
suppose that a person has probability p for E and considers that the
relevant quantity is his expected score

    pf(x,1) + (1 - p)f(x,0).

He will minimize this over x with the result that p = P(x), in accord
with the theorem. The same argument applies in the circumstances of
Lemmas 2 and 3. Minimization of expectation gives an admissible result,
so that we can state the

Corollary:

If the equation in x, P(x) = p, has a unique root for all $0 \le p \le 1$,
then all scores such that P(x) obeys the rules of probability are
admissible.

If P(x) = p has a unique root, we shall refer to the scoring rule
as single-valued. If it has multiple roots we have the possibilities of
probability rules giving inadmissible values, or of admissible values not
obtained through minimization of an expectation. (Examples below show
that both possibilities can occur.) Our next result enhances the status
of the probability transform of x.

Lemma 4:

If, in considering the uncertainty of E conditional on F with
score function f(X,E)F, a person gives x; and with score function
g(Y,E)F, gives y: then P(x) = Q(y).

Here Q(y) = g'(y,0)/{g'(y,0) - g'(y,1)}, see (2), and the result
says that if the score function is changed the probability transform
is invariant. The proof uses the method of Lemmas 2 and 3. The
first-order changes will be

    EF = 1:          f'(x,1)h + g'(y,1)k
    (1 - E)F = 1:    f'(x,0)h + g'(y,0)k

and the determinant necessarily being zero gives P(x) = Q(y).

It follows that a person could proceed by choosing his probability
p in advance of knowing what score function was to be used and then,
when it was announced, providing x satisfying P(x) = p. Robert Nau
has pointed out to me that in the proof of the theorem there is no need
for the score function to be the same for each event considered: each
value can be transformed by its own probability transform to give a
probability value. The next result shows that any probability transform
is possible.

Lemma 5:

For any function P(x) having the properties described in Lemma 1,
there exists a score function with P(x) as probability transform.

For example, let $f(x,0) = (x - x_F)^2$, the quadratic function.
Then from (2)

$$f'(x,1) = 2(x - x_F)[P(x) - 1]/P(x)$$

which, with the boundary condition $f'(x_T,1) = 0$, yields a solution
for f(x,1) satisfying the regularity conditions.
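The construction in Lemma 5 can be sketched numerically: given a transform P, integrate the expression for f'(x,1) back from $x_T$. The example assumes P(x) = x on [0, 1] with $x_F = 0$, $x_T = 1$, and fixes the free constant by taking $f(x_T,1) = 0$; that normalization and the name `build_score_true` are choices made here, not taken from the paper.

```python
def build_score_true(P, x_false, x_true, x, steps=2000):
    """Recover f(x,1) from P by trapezoidal integration of f'(t,1).

    Uses f'(t,1) = 2 (t - x_F) [P(t) - 1] / P(t) and the assumed
    normalization f(x_T,1) = 0, so f(x,1) = -integral from x to x_T.
    Requires P(t) > 0 on [x, x_T].
    """
    def slope(t):
        return 2.0 * (t - x_false) * (P(t) - 1.0) / P(t)

    width = (x_true - x) / steps
    total = 0.5 * (slope(x) + slope(x_true))
    total += sum(slope(x + j * width) for j in range(1, steps))
    return -total * width


def identity(t):
    return t


for x in (0.25, 0.5, 0.9):
    print(x, build_score_true(identity, 0.0, 1.0, x), (x - 1.0) ** 2)
```

With P(x) = x the recovered function agrees with $(x - 1)^2$, so the construction returns De Finetti's quadratic rule, as one would hope.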

In all the discussion so far the only ordering of scores has been
based on admissibility and shows that the probability-transforms include
all admissible values. But when a person selects a value x to describe
his uncertainty he is using more than admissibility: he is selecting one
value out of all admissible ones. In particular, in the case of multiple
roots, he is selecting amongst x-values that yield the same probability.

Our next assumption concerns this additional ordering.

Invariance  Assumption: 

Any  preferences  amongst  scores  do  not  depend  on  the  score  function 
being  used,  and  such  preferences  are  transitive. 

Theorem  2: 

The five assumptions listed above imply that the values x describing
uncertainty will be such that the transforms P(x) obey the laws of
probability and conversely that any x may be attained by selecting
probabilities and minimizing the expected score.

Only the second part requires proof (the first is Theorem 1) and we
use the figure. Here the axes are the scores f(x,1) and f(x,0) and
the solid curve describes these coordinates as x varies from $x_F$ (at F)
to $x_T$ (at T). (It is actually the curve of a quartic rule to be
described below, but will serve for the proof.) This curve will have
slope f'(x,0)/f'(x,1) = -P(x)/{1 - P(x)}, always negative and varying
continuously from zero at F to $-\infty$ at T. By the remark above, any
curve with these properties can be obtained from suitable f's. The
points A and B are both admissible and have the same transforms
P(x): the tangents at A and B have the same slope. The dotted curve
corresponds to another score function which is single-valued and passes
through A with the same slope. On this curve there clearly exists a
point C with both scores less than those of B.


[Figure: curve of the scores f(x,1) and f(x,0) as x varies, with the points F, T, A, B and C.]
Now consider an event with probability p. With the single-valued
score function, A is preferred to C. By admissibility C is
preferred to B. Hence, by the invariance assumption, A is preferred
to B with the original score function. This argument is available
for any point, like A, that minimizes the expected score, and the
theorem is established.

The  results  obtained  apply  only  to  score  functions  which  obey  both 
the  origin  and  scale,  and  regularity  assumptions.  The  former  is  not 
essential.  One  can  have  score  functions  with  several  minima  and,  in 
particular,  several  possible  descriptions  of  a  sure  event.  This  leads 
to  ambiguities  which  can  be  resolved  in  the  sense  that  they  all  lead  to 
the  same  probability,  namely  one,  after  transformation.  No  advantage 
seems  to  accrue  from  such  flexibility. 

The regularity assumptions require considerable discussion. The
existence and continuity of the derivatives is introduced in order to
avoid abrupt changes in the score. The nonvanishing of the derivatives,
except at $x_F$ and $x_T$, is a slight strengthening of the natural
requirement that, at least for admissible values, the score function does not
take the same value for two different choices $x_1$ and $x_2$; for if it
did, there would be no rationale for choosing between $x_1$ and $x_2$ and
again there would be ambiguity. The unnecessarily severe restriction is
that $x_F$ and $x_T$ are interior points, introduced to ensure that the
minima are obtained by the differential calculus, a condition that need
not obtain on boundary points. We consider the case where $x_F$ is a
boundary point: an analogous treatment applies at $x_T$.

The major difference now is that we do not necessarily have
$f'(x_F,0) = 0$. Suppose that we add the condition that
$\lim_{x \to x_F} f'(x,0)/f'(x,1) = 0$ and require that f'(x,0) > 0 for
$x > x_F$. The effect of the limit condition is to make
$\lim_{x \to x_F} P(x) = 0$. It is then straightforward to show that the
properties of P(x) proved in Lemma 1 still
obtain, as do the boundary features considered in Lemmas 2 and 3. The
curve of admissible values used in the proof of Theorem 2 will still have
zero slope at $x_F$ and the argument used there carries over. Consequently
both theorems remain true.

We  therefore  restate  the 

Regularity Assumptions:

X can assume all values in a closed interval I of the real line,
f'(X,E) exists and is continuous in X for each E. For x > (<) $x_F$,
f'(x,0) > (<) 0; for x > (<) $x_T$, f'(x,1) > (<) 0. Then either $x_F$
is an interior point of I or $\lim_{x \to x_F} f'(x,0)/f'(x,1) = 0$. Also
either $x_T$ is interior or $\lim_{x \to x_T} f'(x,1)/f'(x,0) = 0$.

Under these conditions Theorems 1 and 2 persist.

We  now  offer  several  miscellaneous  comments  on  the  results. 

1. Throughout the discussion we have referred to uncertainty of E
conditional on F because conditional assessment is the general form.
If the person knows that F is true then we may speak of the uncertainty
of E. It should be remembered that the full force of the phrase
"conditional on F" is "were the person to be told that F is true".
He is assessing the situation now and scores are only nonzero, and
therefore of concern to him, when F = 1. He need only consider the case
F = 1 but does not need to know that F = 1.


2. A scoring rule is proper if it leads directly to a probability:
that is, if P(x) = x or

    xf'(x,1) = -(1 - x)f'(x,0).

The quadratic rule used by De Finetti has $f(x,1) = (x - 1)^2$ and
$f(x,0) = x^2$ and is clearly proper. As an example of an improper rule
consider $f(x,1) = (x - 1)^4$ and $f(x,0) = x^4$, in which the fourth
powers replace the squares of the proper rule. P(x) is then
$x^3/(3x^2 - 3x + 1)$ and P(x) = p is a cubic in x with a unique root
x for any $0 \le p \le 1$.
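The effect of an improper rule can be seen by minimizing the expected score directly. The sketch below does this for the fourth-power rule by ternary search (valid because the expected score is convex in x) and checks that the minimizer x satisfies P(x) = p although x itself differs from p. The search routine and its names are illustrative choices, not part of the paper.

```python
def expected_score(x, p):
    """Expected fourth-power penalty when E has probability p."""
    return p * (x - 1.0) ** 4 + (1.0 - p) * x ** 4


def transform(x):
    """P(x) = x^3 / (3x^2 - 3x + 1) for the fourth-power rule."""
    return x ** 3 / (3.0 * x ** 2 - 3.0 * x + 1.0)


def best_report(p, lo=0.0, hi=1.0, iterations=200):
    """Ternary search for the x in [lo, hi] minimizing the expected score."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if expected_score(a, p) < expected_score(b, p):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


for p in (0.1, 0.2, 0.5, 0.9):
    x = best_report(p)
    print(p, x, transform(x))
```

A person with probability 0.2 is thus led to state a value nearer one half, and the transform recovers the probability from the stated value.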

The quartic rule, suggested to me by Robert Nau,

    [formula illegible in the source]

provides an example for which P(x) = p has multiple roots or is not
single-valued. Here $x_F = -2$, $x_T = +2$, the regularity conditions are
obeyed with these as interior points and

$$P(x) = (x + 2)(x - 1)^2/4$$

a cubic with three roots in x for every p, $0 \le p \le 1$. It is the
scores for a single event with this rule that are graphed in the figure.
As x decreases from $x_T = 2$ the scores move from T along the curve
to the point a when $x = \sqrt{3}$. These points lie on a convex part of
the curve and can be obtained by minimizing the expected score. As x
decreases further the curve remains convex until at x = 1 it reaches
the point b; but these points, though admissible, cannot be obtained
by a minimization of the expected score and are dominated by points near
F (x near $x_F$) having the same tangent slope. Between x = 1 and
x = 0, when the curve reaches the origin, the curve is concave but the
values are still admissible though again dominated by values near $x_F$.
The situation repeats itself between the origin and F with -x for x
and T for F. Only values $3 \le x^2 \le 4$ are satisfactory and can be
obtained by minimizing the expected score. Between -2 and $-\sqrt{3}$,
P(x) increases monotonically from 0 to 1/2: between $+\sqrt{3}$ and +2
it similarly passes from 1/2 to 1. The stated value has a discontinuity
as p passes through 1/2. It is generally true that the
condition for convexity is P'(x) > 0: this obtains here with $1 \le x^2 \le 4$.
The remaining values, $|x| \le 1$, give points on the concave part of the
curve.

If the scores are plotted for E and $\bar{E}$ (cf. Lemma 2) then the
curve P(x) + P(y) = 1 again gives the three types of points just
considered (minimizing an expected score, convex but not obtained by
minimization, concave) but also points which are inadmissible. These
latter arise when $|x| \le 1$ and y = -x.
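The multiple-root behaviour can be explored with a concrete quartic. The score functions below are an assumption: they are one pair of quartics whose derivatives reproduce the stated transform $P(x) = (x + 2)(x - 1)^2/4$ and which pass through the origin at x = 0; they are not claimed to be the exact form of Nau's rule.

```python
import math


def score_false(x):
    """Assumed f(x,0); its derivative is x^3 - 3x + 2 = (x + 2)(x - 1)^2."""
    return x ** 4 / 4.0 - 1.5 * x ** 2 + 2.0 * x


def score_true(x):
    """Assumed f(x,1) = f(-x,0); its derivative is (x - 2)(x + 1)^2."""
    return score_false(-x)


def transform(x):
    """P(x) = (x + 2)(x - 1)^2 / 4, as stated in the text."""
    return (x + 2.0) * (x - 1.0) ** 2 / 4.0


def expected_score(x, p):
    return p * score_true(x) + (1.0 - p) * score_false(x)


# The three roots of P(x) = 1/2 are x = -sqrt(3), 0 and +sqrt(3).
roots = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
for x in roots:
    print(x, transform(x), expected_score(x, 0.5))
```

Under this assumed form, the outer roots share the smallest expected score when p = 1/2 and the middle root is a stationary point with a larger score, which matches the discontinuity of the stated value as p passes through 1/2.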

3. The regularity assumptions are all obviously reasonable except
those on the limits at $x_F$ and $x_T$ when they are not interior points.
Consider what happens when they do not hold, specifically suppose
$\lim_{x \to x_F} f'(x,0)/f'(x,1) < 0$, or $\lim_{x \to x_F} P(x) = a > 0$.
This implies x must be chosen so that $P(x) \ge a$ or is zero. But Lemma 3
shows that this implies $P(x) \ge a^{1/2}$, and so on. Hence all values of
x must be such that P(x) is 0 or 1: that is, the only admissible values are
$x = x_F$ and $x = x_T$. Such score functions are trivial in that they
always lead to asserting the truth or falsity of any event, a practice
which is encouraged in present-day teaching by the requirement that the
pupil is always expected to answer from a dichotomy "yes" or "no":
"right" or "wrong".

A strange scoring rule illustrating this is the square-root rule
with $f(x,1) = (1 - x)^{1/2}$ and $f(x,0) = x^{1/2}$ for $0 \le x \le 1$.
Here $x_F = 0$, $x_T = 1$. Since $f^2(x,1) + f^2(x,0) = 1$ the curve of
admissible values
is the quarter of the unit circle in the positive quadrant centered at
the origin, which is entirely concave. The only points that can be reached
by minimizing an expected score are $x_F$ and $x_T$. The regularity
conditions are not satisfied: indeed $\lim_{x \to x_F} f'(x,0)/f'(x,1)$ is
infinite and P(x) decreases with x.

4. We now turn to scoring rules that are more useful. The logarithmic
form,

    f(x,1) = -log x and f(x,0) = -log (1 - x),

is defined only in [0,1]. It is proper with P(x) = x. The hyperbolic
form

$$f(x,1) = x^{-1} \quad\text{and}\quad f(x,0) = (1 - x)^{-1}$$

is also only defined in [0,1]. It has

$$P(x) = x^2/\{x^2 + (1 - x)^2\}$$

and is not proper, although P(x) = p has a unique root in x for each
0 < p < 1 and is single-valued.
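Because the hyperbolic rule is single-valued, the stated value can be mapped to a probability and back in closed form. Solving P(x) = p gives $x = \sqrt{p}/(\sqrt{p} + \sqrt{1 - p})$, a step not written out in the text; the sketch below checks it, with function names chosen for illustration.

```python
import math


def hyperbolic_transform(x):
    """P(x) = x^2 / (x^2 + (1 - x)^2) for the hyperbolic rule."""
    return x ** 2 / (x ** 2 + (1.0 - x) ** 2)


def report_for(p):
    """Unique root in (0, 1) of P(x) = p."""
    root_p, root_q = math.sqrt(p), math.sqrt(1.0 - p)
    return root_p / (root_p + root_q)


def expected_score(x, p):
    """Expected hyperbolic penalty p / x + (1 - p) / (1 - x)."""
    return p / x + (1.0 - p) / (1.0 - x)


for p in (0.1, 0.5, 0.8):
    x = report_for(p)
    print(p, x, hyperbolic_transform(x))
```

The stated value is pulled towards one half relative to p, and the same root also minimizes the expected score, in line with the corollary.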

As an example of a rule with infinite range consider the exponential
rule with

$$f(x,1) = e^{-x/2} \quad\text{and}\quad f(x,0) = e^{x/2}.$$

Here $x_F = -\infty$, $x_T = +\infty$ and $P(x) = 1/(1 + e^{-x})$ ranging
from 0 to 1.
This is improper but nevertheless a possibly useful rule in that it
encourages the person to select x corresponding to a probability where
$p = 1/(1 + e^{-x})$ and hence x = log {p/(1 - p)}. In other words, the
values announced are log-odds.
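The log-odds interpretation of the exponential rule is easy to verify. The sketch below assumes the score pair $e^{-x/2}$, $e^{x/2}$, which is the form consistent with the transform $1/(1 + e^{-x})$, and checks that the log-odds of p minimizes the expected score; the function names are illustrative.

```python
import math


def to_log_odds(p):
    """Stated value x = log{p / (1 - p)} for probability p."""
    return math.log(p / (1.0 - p))


def from_log_odds(x):
    """Probability transform P(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_score(x, p):
    """Expected exponential penalty p e^{-x/2} + (1 - p) e^{x/2}."""
    return p * math.exp(-x / 2.0) + (1.0 - p) * math.exp(x / 2.0)


for p in (0.05, 0.5, 0.9):
    x = to_log_odds(p)
    print(p, x, from_log_odds(x), expected_score(x, p))
```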

The rules with

$$f(x,1) = 1 - F_1(x) \quad\text{and}\quad f(x,0) = F_0(x)$$

where $F_i(x)$ are distribution functions on $(-\infty, \infty)$ are interesting
because they are bounded both above and below and are defined on
$[-\infty, \infty]$.
If $f_1(x)$ and $f_0(x)$ are the corresponding densities,
$P(x) = f_0(x)/\{f_0(x) + f_1(x)\}$. Often these do not provide acceptable
rules since the range of P(x) is not the full unit interval. An
extreme case arises with $f_0(x) = f_1(x)$ when P(x) = 1/2 for all x
and only $\pm\infty$ are admissible: see comment (12) below. If $f_1(x)$
corresponds to N(1,2) and $f_0(x)$ to N(-1,2), then $P(x) = 1/(1 + e^{x})$
and we are back to a log-odds rule.

5.  The  notion  of  admissibility  is  essentially  that  of  Pareto  optimality. 
One  way  of  expressing  the  result  of  this  paper  is  to  say  that  a  person  who 
accepts  Pareto  optimality  and  the  invariance  assumption,  and  who  then,  by 
some  unstated  process,  selects  a  unique  value  from  the  Pareto  set,  is 
effectively  introducing  probabilities  and  minimizing  an  expected  value. 

In situations where the single-valued condition does not obtain, many
of the values in the Pareto set are ruled out. (Nau's quartic rule
illustrates this.)
6. The considerations of this paper have considerable practical
import besides the justification of the Bayesian argument.

Consider a geologist who, after a survey, is asked to express his
uncertainty about E, the existence of oil at a site, conditional on
the result F of the survey. Then he may well see the position in terms of
implicit score functions reflecting the dangers of giving a high value,
so encouraging drilling, when the area is dry; and the lesser dangers of
giving a low value when subsequent drilling reveals oil. It would not be
unreasonable to expect that the implicit score function was improper and
that he will therefore be motivated to give x rather than his probability
p. This suggests that in many cases attention should be paid to the
score function so that the stated value may be transformed onto the
probability scale. If the geologist provides several assessments then
information about the transform, and hence about the scores, can be found from
the known probability structure of the transformed value.

It may, of course, happen that the implicit score function just
referred to does not obey the regularity conditions. In which case the
geologist will be led to make emphatic statements about the existence of
oil, as was mentioned in comment 4.

7. We now consider ways of assigning numbers to uncertain events
that have been suggested in the literature to see if they lead to admissible
values when judged by any scoring rule. For a real parameter $\theta$, the
method of (one-sided) confidence intervals enables a number to be attached
to the event E, that $\theta < a$, conditional on F, the data: this is
the confidence that $\theta < a$ and we write $cf(\theta < a \mid \text{data})$.
Suppose

$$cf(\theta < -1 \mid \text{data}) = \alpha, \qquad cf(\theta < +1 \mid \text{data}) = \beta \tag{3}$$

and

$$cf(\theta < -1 \mid \text{data}, \theta < +1) = \gamma. \tag{4}$$

Then, if the confidence method is admissible, we must be able to find $P(\cdot)$
such that $P(\alpha) = P(\beta)P(\gamma)$: this follows from Lemma 3. But
since the confidence statement (3) is derived from a probability statement
valid for all $\theta$, the restriction to $\theta < +1$ in (4) makes no
difference to the validity of the statement and hence $\gamma = \alpha$.
Consequently $P(\alpha) = P(\beta)P(\alpha)$ and either $P(\beta) = 1$ or
$P(\alpha) = 0$. Hence there is
no transform of a confidence statement to a probability statement and
the confidence values are inadmissible.

8. Another way of assigning numbers is through significance tests.
Let data x have an exponential distribution with density $\theta e^{-\theta x}$,
$x \ge 0$, $\theta > 0$. To test the hypothesis that $\theta = \omega$, against
the alternative $\theta \ne \omega$, when x is unexpectedly large on
hypothesis $\omega$, the "tail" of the null distribution is used:

$$P(X > x \mid \omega)$$

and

$$sg(\omega \mid x) = e^{-\omega x}$$

is the significance attached to the event E that $\theta = \omega$, given F,
the data. If x is small, the other tail is used and
$sg(\omega \mid x) = 1 - e^{-\omega x}$.
Hence for all x

$$sg(\omega \mid x) = \min\{e^{-\omega x},\ 1 - e^{-\omega x}\}.$$
For this to correspond to a scoring rule, there must exist a transform
$P(\cdot)$ of these values to nonnegative values with the integral over all
$\omega$ equal to unity: this is the addition rule of probability. But the
significance value depends only on $\omega x$, so $\int P(\omega x)\,d\omega = 1$.
Let $\omega x = u$, then $\int P(u)\,du = x$ for all x, which is impossible.
Hence significance statements are inadmissible.
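The scaling that defeats the significance level can be seen numerically. The sketch below takes the identity as the transform, which is an assumption made only for illustration (the argument in the text covers every transform), and integrates the significance over $\omega$ by the midpoint rule. The integral equals (log 2)/x, so it cannot equal unity for every x.

```python
import math


def significance(omega, x):
    """Two-tailed significance min{e^{-omega x}, 1 - e^{-omega x}}."""
    tail = math.exp(-omega * x)
    return min(tail, 1.0 - tail)


def integral_over_omega(x, upper_u=50.0, steps=50000):
    """Midpoint-rule integral of the significance for omega in (0, upper_u / x).

    The upper limit is chosen so that the neglected tail is of order e^{-50}.
    """
    width = (upper_u / x) / steps
    return width * sum(significance((j + 0.5) * width, x) for j in range(steps))


for x in (1.0, 2.0, 4.0):
    print(x, integral_over_omega(x), math.log(2.0) / x)
```

Doubling x halves the integral, which is the dependence on x that the substitution $\omega x = u$ exposes.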

9. The discussion in (7) and (8) of confidence and significance
statements is based on my personal understanding of these methods. That
understanding may be defective because the methods are not unambiguously
described. For example, in (7), is the result that led to $\gamma = \alpha$ correct?
Is a confidence statement altered if the parametric range is restricted?
The discussion of significance levels in (8) is similarly bedevilled by
the ambiguity over whether one- or two-sided tests are appropriate: we
have used only the one-sided form. It is my conviction that both these
methods are inadmissible because they violate the likelihood principle,
that easily follows from the probabilistic description of uncertainty.

10.  Another  way  of  assigning  numbers  to  uncertain  events  has  been 
suggested  by  Zadeh  (1979).  These  numbers  are  called  possibilities.  Let 
all  statements  be  conditional  on  the  same  event  not  described  in  the 
notation. Then the possibilities $\Pi(E)$ for events $E$ satisfy the rule
of combination

$$\Pi(E \cup F) = \max\{\Pi(E), \Pi(F)\}.$$

This is in conflict with the corresponding probability rule

$$p(E \cup F) = p(E) + p(F) - p(E \cap F),$$

which is a linear operator on the statement for indicator variables

$$1 - (1 - E)(1 - F) = E + F - EF.$$

The possibility relation, being nonlinear, cannot be transformed into this
linear form and hence possibilities are inadmissible.
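
As a small numerical illustration (a sketch under stated assumptions, not the author's proof), the code below first checks the indicator identity for all four truth assignments. It then takes a hypothetical example, a fair die with two disjoint events, and applies both combination rules to the same numbers. Feeding probabilities into the max rule is an assumption made only to show that the two rules disagree.

```python
from fractions import Fraction
from itertools import product

# Indicator identity behind the probability rule: for E, F in {0, 1},
# the indicator of the union is 1 - (1 - E)(1 - F) = E + F - EF.
for e, f in product((0, 1), repeat=2):
    assert 1 - (1 - e) * (1 - f) == e + f - e * f

# Hypothetical example: a fair die with disjoint events E = {1, 2}, F = {3, 4}.
E, F = {1, 2}, {3, 4}


def prob(event):
    """Probability of an event under the uniform distribution on {1, ..., 6}."""
    return Fraction(len(event), 6)


additive = prob(E) + prob(F) - prob(E & F)  # probability rule for the union
max_rule = max(prob(E), prob(F))            # max rule applied to the same numbers

print(additive, max_rule)  # prints: 2/3 1/3
```

The additive rule reproduces the probability of the union, whereas the max rule returns the larger of the two inputs and ignores the other.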

11.  An extension of the idea of using a single number to describe
the uncertainty of $E$ conditional on $F$ is to use two values, $x_1$, $x_2$.
They are sometimes called upper and lower probabilities. To score these,
one might use a function $f(x_1, x_2, E)F$. Consider applying the admissibility
ideas here. (We omit the details which parallel those given above.)

With $(x_1, x_2)$ stated for $E$ conditional on $F$, the scores are

    $EF = 1$:          $f(x_1, x_2, 1)$

    $(1 - E)F = 1$:    $f(x_1, x_2, 0)$.

As before consider small changes $\delta x_1$, $\delta x_2$ in the values.
Then the score changes will be

$$f_1(x_1, x_2, 1)\,\delta x_1 + f_2(x_1, x_2, 1)\,\delta x_2$$

and

$$f_1(x_1, x_2, 0)\,\delta x_1 + f_2(x_1, x_2, 0)\,\delta x_2,$$

where $f_i$ denotes the derivative with respect to the $i$th argument.

For admissibility the determinant must vanish. This determinant is equal
to the Jacobian of the transformation from $(x_1, x_2)$ to the pair of
scores $(f, g)$. If it vanishes everywhere, the functions $f$ and $g$ assume
constant values on the same curve in the $(x_1, x_2)$-plane, so that there is
no reason to choose between different values on the curve and the subject is
effectively only using one number (that describes which curve), rather than
two, to measure his uncertainty.

If the Jacobian does not vanish everywhere then the values of $(x_1, x_2)$
are confined to the curve where it does vanish, namely where

$$\frac{f_1(x_1, x_2, 1)}{f_2(x_1, x_2, 1)} = \frac{f_1(x_1, x_2, 0)}{f_2(x_1, x_2, 0)}. \qquad (5)$$

Call this common value $h(x_1, x_2)$. Then again, in effect, the subject
is only providing a single number describing his position on that curve.
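
The confinement to a curve can be seen in a small sketch. The score used here, the squared distance of the stated pair from $(E, E)$, is a hypothetical choice made only for illustration and is not taken from the paper. For it the determinant works out to $4(x_1 - x_2)$, so admissible pairs lie on the line $x_1 = x_2$, and a pair off that line is beaten whether $E$ is true or false.

```python
def f(x1, x2, e):
    """Hypothetical two-number score: squared distance of (x1, x2) from (e, e)."""
    return (x1 - e) ** 2 + (x2 - e) ** 2


def f1(x1, x2, e):
    """Partial derivative of f with respect to its first argument."""
    return 2.0 * (x1 - e)


def f2(x1, x2, e):
    """Partial derivative of f with respect to its second argument."""
    return 2.0 * (x2 - e)


def jacobian(x1, x2):
    """Determinant of the score changes; for this f it equals 4 * (x1 - x2)."""
    return f1(x1, x2, 1) * f2(x1, x2, 0) - f2(x1, x2, 1) * f1(x1, x2, 0)


off_curve = (0.2, 0.6)  # determinant is nonzero here
on_curve = (0.4, 0.4)   # determinant vanishes here

print(jacobian(*off_curve), jacobian(*on_curve))
# The on-curve pair has the smaller score both when E is true and when E is false.
print(f(*off_curve, 1), f(*on_curve, 1))
print(f(*off_curve, 0), f(*on_curve, 0))
```

On the line $x_1 = x_2$ the pair carries no more information than the single common value, which is the sense in which only one number is being provided.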

For example, suppose $(x_1, x_2)$ is given for $E$, and $(y_1, y_2)$ for
the complement $\bar{E}$, both conditional on $F$. This is the situation
comparable to that in Lemma 2 and the total scores are

    $EF = 1$:          $f(x_1, x_2, 1) + f(y_1, y_2, 0)$

and $(1 - E)F = 1$:    $f(x_1, x_2, 0) + f(y_1, y_2, 1)$.

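The changes in scores, resulting from changes $(\delta x_1, \delta x_2)$ in
$(x_1, x_2)$ and $(\delta y_1, \delta y_2)$ in $(y_1, y_2)$, will be, on
utilizing (5),

$$f_2(x_1, x_2, 1)\,[h(x_1, x_2)\,\delta x_1 + \delta x_2] + f_2(y_1, y_2, 0)\,[h(y_1, y_2)\,\delta y_1 + \delta y_2]$$
$$f_2(x_1, x_2, 0)\,[h(x_1, x_2)\,\delta x_1 + \delta x_2] + f_2(y_1, y_2, 1)\,[h(y_1, y_2)\,\delta y_1 + \delta y_2].$$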

The vanishing of the determinant gives

$$f_2(x_1, x_2, 1)\,f_2(y_1, y_2, 1) = f_2(x_1, x_2, 0)\,f_2(y_1, y_2, 0),$$

so that $P(x_1, x_2) + P(y_1, y_2) = 1$ and we are back to the addition rule
for probabilities. The product rule follows similarly.

This  does  not  close  the  book  on  the  idea  of  using  two  or  more  numbers 
to  describe  uncertainty,  for  it  might  be  reasonable  to  use  two  or  more 
score  functions,  measuring  different  qualities  of  the  descriptions  in  the 
manner  of  a  multiattribute  utility  function. 

12.  The argument of Shafer (1976) is affected by the scoring-rule
criterion. He suggests, in the situation of Lemma 2, that any values,
$x$ for $E$, $y$ for the complement $\bar{E}$, could be used subject only to
the requirements that $x \ge 0$, $y \ge 0$, $x + y \le 1$. Such numbers are
possible values for a belief function. But Lemma 2 shows that
$P(x) + P(y) = 1$ and hence the only scoring rule to make all Shafer's values
admissible has $P(x) = \tfrac{1}{2}$, or $f(x,0) + f(x,1) = \text{constant}$.
But this contradicts the product rule in Lemma 3. Alternatively
$P(x) = \tfrac{1}{2}$ means that $f'(x,0) = -f'(x,1)$ and hence the limit of
$f'(x,0)/f'(x,1)$ taken in the regularity condition equals $-1$, in
contradiction of that condition.
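
A minimal sketch of the conflict, using the quadratic score $(x - 1)^2$ if $E$ is true and $x^2$ if $E$ is false. The form of the transform, $P(x) = f'(x,0)/(f'(x,0) - f'(x,1))$, is an assumption here: it is chosen because it agrees with the statement above that $P(x) = \tfrac{1}{2}$ means $f'(x,0) = -f'(x,1)$. The function names are hypothetical.

```python
def dscore(x, e):
    """Derivative in x of the quadratic score: (x - 1)**2 if e == 1, x**2 if e == 0."""
    return 2.0 * (x - 1.0) if e == 1 else 2.0 * x


def transform(x):
    """Assumed transform P(x) = f'(x,0) / (f'(x,0) - f'(x,1)); here it reduces to x."""
    return dscore(x, 0) / (dscore(x, 0) - dscore(x, 1))


def determinant(x, y):
    """Determinant of the score changes when x is stated for E and y for its complement.

    For the quadratic score it equals 4 * (1 - x - y).
    """
    return dscore(x, 1) * dscore(y, 1) - dscore(x, 0) * dscore(y, 0)


def total_scores(x, y):
    """Total quadratic scores (E true, E false) for x stated for E and y for its complement."""
    return (x - 1.0) ** 2 + y ** 2, x ** 2 + (y - 1.0) ** 2


shafer_pair = (0.2, 0.3)        # satisfies x >= 0, y >= 0, x + y <= 1
additive_pair = (0.45, 0.55)    # satisfies x + y = 1

print(determinant(*shafer_pair), determinant(*additive_pair))
# The additive pair has the smaller total score in both cases.
print(total_scores(*shafer_pair), total_scores(*additive_pair))
```

Under this score the determinant vanishes only when $x + y = 1$, so a pair with $x + y < 1$ can be improved upon whichever way $E$ turns out.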

13.  Notice  that  in  the  score  assumption  we  have  supposed  that  n  , 
the  number  of  events  judged,  is  finite.  The  infinite  case  causes 
difficulties  due  to  the  possible  divergence  of  the  series  describing  the 
total  scores.  As  a  result  we  have  only  established  the  addition  rule 
for a finite number of events and the resulting probability is only
finitely additive and not $\sigma$-additive. We have been unable to see how,
or even if it is possible, to extend the notion of a score to an enumerable
infinity of statements.